[Badger Batch] Insert approvals with badger batch update #6381

zhangchiqing · 2024-08-21T00:52:21Z

Before migrating to Pebble-based storage, we first refactor the existing Badger-based storage to use batch updates instead of transactions, so that the database operations are similar to how Pebble stores data.

See #6374 for how concurrency safety is ensured.

The following concurrency tests passed:

go test --failfast --tags=relic -run=TestApprovalStoreTwoDifferentApprovalsConcurrently -count=100
PASS
ok      github.com/onflow/flow-go/storage/badger        5.818s

codecov-commenter · 2024-08-21T00:58:55Z

Codecov Report

Attention: Patch coverage is 38.67403% with 111 lines in your changes missing coverage. Please review.

Project coverage is 41.49%. Comparing base (400bcd0) to head (ddd9404).

Files	Patch %	Lines
storage/badger/operation/reader_batch_writer.go	0.00%	53 Missing ⚠️
storage/badger/cache_b.go	59.09%	26 Missing and 1 partial ⚠️
storage/badger/operation/common.go	0.00%	19 Missing ⚠️
storage/badger/operation/approvals.go	0.00%	8 Missing ⚠️
storage/badger/approvals.go	85.71%	2 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6381      +/-   ##
==========================================
- Coverage   41.50%   41.49%   -0.01%     
==========================================
  Files        2013     2016       +3     
  Lines      143577   143718     +141     
==========================================
+ Hits        59590    59636      +46     
- Misses      77813    77906      +93     
- Partials     6174     6176       +2

Flag	Coverage Δ
unittests	`41.49% <38.67%> (-0.01%)`	⬇️


zhangchiqing · 2024-08-22T16:46:48Z

storage/badger/operation/reader_batch_writer.go

+}
+
+func (b *ReaderBatchWriter) DeleteRange(start, end []byte) error {
+	// TODO: implement


This is a bit challenging to implement in a concurrency-safe way, so I will implement it in a separate PR. This module doesn't need it yet, so I will add it along with the modules that do. For now I'm leaving it as a TODO item.

I'm wondering if we even need to implement it? I don't see any usage of DeleteRange in our codebase right now.

I assume we will want to add it later. But maybe we can:

do the migration with the common writer interface omitting DeleteRange

when the migration is complete, re-add DeleteRange to the writer interface (then we don't need to implement it with Badger)

OK, I can remove it for now.

We need the DeleteRange method for Pebble to remove all events, transactions, etc. for a given block when rolling back the executed height.

I used Pebble's DeleteRange method because it can delete a range of keys with a certain prefix atomically.
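
A range delete over a half-open interval [start, end) needs an exclusive end key that sorts just after every key sharing the prefix. The following is a minimal sketch of how such a bound could be computed. The helper name `prefixUpperBound` is hypothetical and not taken from this PR, and the example key bytes are made up for illustration.

```go
package main

import "fmt"

// prefixUpperBound returns the smallest key that sorts after every key
// starting with prefix. It returns nil when no such key exists (empty prefix
// or all bytes are 0xff), meaning the range is unbounded above.
// Hypothetical helper, for illustration only.
func prefixUpperBound(prefix []byte) []byte {
	end := make([]byte, len(prefix))
	copy(end, prefix)
	// Walk backwards to the last byte that can still be incremented.
	for i := len(end) - 1; i >= 0; i-- {
		if end[i] != 0xff {
			end[i]++
			return end[:i+1]
		}
	}
	return nil
}

func main() {
	// Assumed layout: one prefix-code byte followed by an identifier byte.
	start := []byte{0x20, 0x01}
	end := prefixUpperBound(start)
	fmt.Printf("delete range [%x, %x)\n", start, end)
}
```

The resulting pair is what one would pass as the start and end arguments of a range delete, assuming keys are compared bytewise.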

storage/badger/operation/reader_batch_writer.go

jordanschalm · 2024-08-22T19:28:52Z

storage/badger/operation/reader_batch_writer.go

+	batch *badger.WriteBatch
+
+	addingCallback sync.Mutex // protect callbacks
+	callbacks      []func(error)


Once we're done prototyping and align on a structure we expect to be long-lived (maybe now?), we should document these callback functions, for example:

The error input is the error returned from batch.Commit

The callback is called regardless of whether the batch was successfully committed (callbacks are responsible for verifying this by checking the error input)

Callbacks must be non-blocking
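
The callback contract listed above can be sketched as follows. This is an illustrative stand-in, not the PR's implementation: `pendingBatch`, its injected `commit` function, and `errDiskFull` are assumptions made so the example runs without a database.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errDiskFull is a made-up commit failure used by the example.
var errDiskFull = errors.New("disk full")

// pendingBatch is a hypothetical stand-in for a write batch. The commit
// function is injected so the sketch does not depend on a real database.
type pendingBatch struct {
	commit func() error

	mu        sync.Mutex // protects callbacks
	callbacks []func(error)
}

// AddCallback registers fn to run after the commit attempt. fn receives the
// error returned by the commit (nil on success) and must not block.
func (b *pendingBatch) AddCallback(fn func(error)) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.callbacks = append(b.callbacks, fn)
}

// Commit attempts the commit and then invokes every callback with the commit
// error, regardless of whether the commit succeeded.
func (b *pendingBatch) Commit() error {
	err := b.commit()

	// Take the callbacks out under the lock, then run them without it.
	b.mu.Lock()
	callbacks := b.callbacks
	b.callbacks = nil
	b.mu.Unlock()

	for _, fn := range callbacks {
		fn(err)
	}
	return err
}

func main() {
	b := &pendingBatch{commit: func() error { return errDiskFull }}
	b.AddCallback(func(err error) {
		if err != nil {
			fmt.Println("commit failed, skipping cache update:", err)
			return
		}
		fmt.Println("commit succeeded, updating cache")
	})
	_ = b.Commit()
}
```

Each callback checks the error itself, which matches the suggestion that callbacks are responsible for verifying whether the batch was committed.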

jordanschalm · 2024-08-22T19:44:10Z

storage/badger/operation/reader_batch_writer.go

+
+var _ storage.BadgerReaderBatchWriter = (*ReaderBatchWriter)(nil)
+
+func (b *ReaderBatchWriter) ReaderWriter() (storage.Reader, storage.Writer) {


Realistically we will need to keep this structure where the ReaderBatchWriter passed to lower-level storage methods can both read and write, even though it doesn't fit well conceptually with Pebble. I think it would be worthwhile to make the interface naming and documentation clearly communicate the fact that the reader is reading globally from committed database state.

I also feel it would be better to separate the method to access reader and writer:

we want the design to discourage intermingling reads and writes, and communicate the fact that reads and writes are not being applied to the same snapshot of state like they were with badger.

having one return value allows chaining (e.g. batch.Reader().Get(...))

separating the functions allows us to distinctly document each one (since the reader and writer are fairly conceptually different)

// GlobalReader returns a database-backed reader which reads the latest committed global database state ("read-committed isolation").
// This reader will not read writes written to ReaderBatchWriter.Writer until the write batch is committed.
// This reader may observe different values for the same key on subsequent reads.
func (b *ReaderBatchWriter) GlobalReader() storage.Reader { ... }

// Writer returns a writer associated with a batch of writes. The batch is pending until it is committed.
// When we `Write` into the batch, that write operation is added to the pending batch, but not committed.
// The commit operation is atomic w.r.t. the batch; either all writes are applied to the database, or no writes are.
func (b *ReaderBatchWriter) Writer() storage.Writer { ... }

Makes sense.

I originally combined them because there are a few places that need to intermingle reads with writes. But I think we can probably refactor that logic, such as bootstrapping a sealing segment.
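
To illustrate the read-committed behaviour discussed in this thread, here is a minimal in-memory sketch. All types (`memDB`, `batch`, `globalReader`, `writer`) are hypothetical stand-ins and do not reflect the real `storage.Reader` or `storage.Writer` interfaces; the point is only that a pending write stays invisible to the global reader until the batch is committed.

```go
package main

import (
	"fmt"
	"sync"
)

// memDB is a hypothetical in-memory stand-in for committed database state.
type memDB struct {
	mu   sync.RWMutex
	data map[string]string
}

// batch buffers writes until Commit. Reads bypass the buffer entirely.
type batch struct {
	db      *memDB
	pending map[string]string
}

func newBatch(db *memDB) *batch {
	return &batch{db: db, pending: make(map[string]string)}
}

// globalReader reads committed state only ("read-committed").
type globalReader struct{ db *memDB }

func (r globalReader) Get(key string) (string, bool) {
	r.db.mu.RLock()
	defer r.db.mu.RUnlock()
	v, ok := r.db.data[key]
	return v, ok
}

// writer adds writes to the pending batch without committing them.
type writer struct{ b *batch }

func (w writer) Set(key, value string) { w.b.pending[key] = value }

// Separate accessors allow chaining and separate documentation.
func (b *batch) GlobalReader() globalReader { return globalReader{db: b.db} }
func (b *batch) Writer() writer             { return writer{b: b} }

// Commit applies all pending writes under one lock, so readers observe
// either none of them or all of them.
func (b *batch) Commit() {
	b.db.mu.Lock()
	defer b.db.mu.Unlock()
	for k, v := range b.pending {
		b.db.data[k] = v
	}
	b.pending = make(map[string]string)
}

func main() {
	db := &memDB{data: map[string]string{}}
	b := newBatch(db)

	b.Writer().Set("approval", "A1")
	_, ok := b.GlobalReader().Get("approval")
	fmt.Println("visible before commit:", ok)

	b.Commit()
	_, ok = b.GlobalReader().Get("approval")
	fmt.Println("visible after commit:", ok)
}
```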

zhangchiqing · 2024-08-23T00:30:02Z

storage/badger/operation/reader_batch_writer.go

+	return b.batch
+}
+
+func (b *ReaderBatchWriter) AddCallback(callback func(error)) {


Implemented storage.OnCommitSucceed.

storage/badger/approvals.go

jordanschalm · 2024-08-22T20:09:55Z

storage/badger/cache_b.go

+// injected. During normal operations, the following error returns are expected:
+//   - `storage.ErrNotFound` if key is unknown.
+func (c *CacheB[K, V]) Get(key K) func(storage.Reader) (V, error) {
+	return func(tx storage.Reader) (V, error) {


Suggested change

-	return func(tx storage.Reader) (V, error) {
+	return func(reader storage.Reader) (V, error) {

Far from the highest priority, but it would be great if we could replace "transaction" terminology with "reader" or "writer" terminology where we notice it in new code.

storage/badger/operation/common.go

@@ -44,6 +44,22 @@ func batchWrite(key []byte, entity interface{}) func(writeBatch *badger.WriteBat
 	}
 }

+func insertW(key []byte, val interface{}) func(storage.Writer) error {


storage/badger/operation/reader_batch_writer.go

jordanschalm

This looks great as a starting point for implementing read-committed concurrency safety against Badger. Nice work.

Added a few final comments:

Before merging this, please add documentation to exported methods and types.
I added a suggestion for a pattern to communicate which low-level storage operations must be synchronized by a higher-level lock.

jordanschalm · 2024-08-23T15:22:39Z

storage/badger/operation/approvals.go

-func RetrieveResultApproval(approvalID flow.Identifier, approval *flow.ResultApproval) func(*badger.Txn) error {
-	return retrieve(makePrefix(codeResultApproval, approvalID), approval)
+func RetrieveResultApproval(approvalID flow.Identifier, approval *flow.ResultApproval) func(storage.Reader) error {
+	return retrieveR(makePrefix(codeResultApproval, approvalID), approval)
 }

 // IndexResultApproval inserts a ResultApproval ID keyed by ExecutionResult ID
 // and chunk index. If a value for this key exists, a storage.ErrAlreadyExists
 // error is returned. This operation is only used by the ResultApprovals store,
 // which is only used within a Verification node, where it is assumed that there
 // is only one approval per chunk.


Suggested change

-// is only one approval per chunk.
+// is only one approval per chunk.
+// CAUTION: Use of this function must be synchronized by storage.ResultApprovals.

I'd like to communicate which low-level functions must be synchronized by a higher-level storage procedure. Adding documentation and using Unsafe... naming seems like a relatively easy way to do that.

jordanschalm · 2024-08-23T15:22:49Z

storage/badger/operation/approvals.go

 }

 // IndexResultApproval inserts a ResultApproval ID keyed by ExecutionResult ID
 // and chunk index. If a value for this key exists, a storage.ErrAlreadyExists
 // error is returned. This operation is only used by the ResultApprovals store,
 // which is only used within a Verification node, where it is assumed that there
 // is only one approval per chunk.
-func IndexResultApproval(resultID flow.Identifier, chunkIndex uint64, approvalID flow.Identifier) func(*badger.Txn) error {
-	return insert(makePrefix(codeIndexResultApprovalByChunk, resultID, chunkIndex), approvalID)
+func IndexResultApproval(resultID flow.Identifier, chunkIndex uint64, approvalID flow.Identifier) func(storage.Writer) error {


Suggested change

-func IndexResultApproval(resultID flow.Identifier, chunkIndex uint64, approvalID flow.Identifier) func(storage.Writer) error {
+func UnsafeIndexResultApproval(resultID flow.Identifier, chunkIndex uint64, approvalID flow.Identifier) func(storage.Writer) error {
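
The Unsafe naming pattern suggested here could look roughly like the sketch below. It uses an in-memory map instead of Badger, and every name in it (`index`, `unsafeIndexApproval`, `approvalStore`, `errAlreadyExists`) is a hypothetical stand-in rather than code from this PR. The low-level function does a check-then-insert without locking; the higher-level store owns the mutex that makes the sequence safe.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errAlreadyExists is a stand-in for storage.ErrAlreadyExists.
var errAlreadyExists = errors.New("key already exists")

// index is a hypothetical in-memory stand-in for the database.
type index map[string]string

// unsafeIndexApproval performs a check-then-insert without any locking.
// CAUTION: calls must be serialized by the caller, e.g. via approvalStore.
func unsafeIndexApproval(db index, key, approvalID string) error {
	if _, exists := db[key]; exists {
		return errAlreadyExists
	}
	db[key] = approvalID
	return nil
}

// approvalStore is the higher-level component that owns the lock.
type approvalStore struct {
	mu sync.Mutex
	db index
}

// Index serializes the check-then-insert so that two concurrent callers
// cannot both pass the existence check for the same key.
func (s *approvalStore) Index(key, approvalID string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	return unsafeIndexApproval(s.db, key, approvalID)
}

func main() {
	store := &approvalStore{db: index{}}
	results := make([]error, 2)

	var wg sync.WaitGroup
	for i := range results {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = store.Index("result/chunk0", fmt.Sprintf("approval-%d", i))
		}(i)
	}
	wg.Wait()

	// Exactly one of the two concurrent inserts is rejected.
	fmt.Println(results)
}
```

The prefix makes the synchronization requirement visible at every call site, while the doc comment states who is responsible for the lock.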

jordanschalm · 2024-08-23T15:25:13Z

storage/badger/operation/reader_batch_writer.go

+
+var _ storage.BadgerReaderBatchWriter = (*ReaderBatchWriter)(nil)
+
+func (b *ReaderBatchWriter) GlobalReader() storage.Reader {


Before merging, please add documentation for these and other public methods. I wrote a version in this comment for the reader/writer methods: #6381 (comment).

jordanschalm

Approving to unblock as I'll be away next week and I don't think my outstanding comments require further discussion. I think this is broadly a good starting point for the plan to implement read-committed concurrency safety against Badger.

I'm happy to merge this once remaining comments have been addressed (mainly documentation), and we make the decision to go ahead with this plan to implement concurrency safety changes against Badger.

Co-authored-by: Jordan Schalm <[email protected]>

zhangchiqing · 2024-09-14T00:26:30Z

Closing for now. I found a better way to refactor.

zhangchiqing · 2024-09-14T01:18:24Z

In favor of #6466

zhangchiqing force-pushed the leo/badger-batch-approvals branch from 680aa94 to d3ec64f Compare August 21, 2024 00:52

zhangchiqing requested review from jordanschalm and AlexHentschel August 21, 2024 03:02

zhangchiqing marked this pull request as ready for review August 21, 2024 03:02

zhangchiqing commented Aug 22, 2024

zhangchiqing force-pushed the leo/badger-batch-approvals branch from 816b288 to 2cc8232 Compare August 22, 2024 16:47

jordanschalm reviewed Aug 22, 2024

jordanschalm reviewed Aug 23, 2024

jordanschalm approved these changes Aug 23, 2024

zhangchiqing and others added 10 commits August 23, 2024 10:21

insert approvals with badger batch update

02df161

add concurrent tests

9470345

rename

32bb86b

Apply suggestions from code review

5362f46

Co-authored-by: Jordan Schalm <[email protected]>

update comments

bfa51ec

add OnCommitSucceed

3c6e797

remove DeleteRange

0437ff8

replace ReaderWriter with GlobalReader and Writer

f99eea7

refactor ToReader

300119d

addressing review comments

ddd9404

zhangchiqing force-pushed the leo/badger-batch-approvals branch from 1e03a2d to ddd9404 Compare August 23, 2024 17:21

zhangchiqing requested a review from durkmurder September 3, 2024 16:52

zhangchiqing closed this Sep 14, 2024

zhangchiqing mentioned this pull request Sep 14, 2024

[Badger] Refactor approvals to use badger batch updates #6466

Draft
